This dataset contains daily and hourly ridership counts for Capital Bikeshare in Washington, DC, together with weather information and calendar context for each date. It was obtained from the UCI Machine Learning Repository.
The dataframes are made up of the following columns:
The dataset has no missing values, so the only cleaning needed was converting the date column from character strings to date objects and the categorical variables from numeric codes to factors.
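The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the original code: the `raw` data frame is a hypothetical three-row stand-in with toy values, the `%Y-%m-%d` date format is an assumption, and the list of categorical columns is inferred from the factor terms that appear in the regression output.

```r
# Hypothetical miniature of the daily table (toy values, not the real data)
raw <- data.frame(
  dteday     = c("2011-01-01", "2011-01-02", "2011-01-03"),
  season     = c(1, 1, 1),
  yr         = c(0, 0, 0),
  mnth       = c(1, 1, 1),
  holiday    = c(0, 0, 0),
  weekday    = c(6, 0, 1),
  workingday = c(0, 0, 1),
  weathersit = c(2, 2, 1),
  cnt        = c(900, 800, 1300),
  stringsAsFactors = FALSE
)

# Columns stored as numeric codes that should be treated as categories
cat_cols <- c("season", "yr", "mnth", "holiday",
              "weekday", "workingday", "weathersit")

clean <- raw
# Character -> Date (the format string is an assumption about the file)
clean$dteday <- as.Date(clean$dteday, format = "%Y-%m-%d")
# Numeric codes -> factors
clean[cat_cols] <- lapply(clean[cat_cols], factor)

# Confirm nothing is missing after the conversion
stopifnot(!anyNA(clean))
str(clean)
```

Converting the codes to factors matters for the models that follow: `lm()` then estimates one coefficient per level (for example `season2`, `season3`, `season4`) instead of a single slope on the numeric code.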
##
## Call:
## lm(formula = cnt ~ . - cnt - instant - dteday - casual - registered,
## data = myData_daily)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3944.7 -348.2 63.8 457.4 2912.7
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1485.84 239.75 6.198 9.77e-10 ***
## season2 884.71 179.49 4.929 1.03e-06 ***
## season3 832.70 213.13 3.907 0.000102 ***
## season4 1575.35 181.00 8.704 < 2e-16 ***
## yr1 2019.74 58.22 34.691 < 2e-16 ***
## mnth2 131.03 143.78 0.911 0.362443
## mnth3 542.83 165.43 3.281 0.001085 **
## mnth4 451.17 247.57 1.822 0.068820 .
## mnth5 735.51 267.63 2.748 0.006145 **
## mnth6 515.40 282.41 1.825 0.068423 .
## mnth7 30.80 313.82 0.098 0.921854
## mnth8 444.95 303.17 1.468 0.142639
## mnth9 1004.17 265.12 3.788 0.000165 ***
## mnth10 519.67 241.55 2.151 0.031787 *
## mnth11 -116.69 230.78 -0.506 0.613257
## mnth12 -89.59 182.21 -0.492 0.623098
## holiday1 -589.70 180.36 -3.270 0.001130 **
## weekday1 212.05 109.49 1.937 0.053187 .
## weekday2 309.53 107.13 2.889 0.003982 **
## weekday3 381.36 107.48 3.548 0.000414 ***
## weekday4 386.34 107.53 3.593 0.000350 ***
## weekday5 436.98 107.44 4.067 5.30e-05 ***
## weekday6 440.46 106.56 4.133 4.01e-05 ***
## workingday1 NA NA NA NA
## weathersit2 -462.54 77.09 -6.000 3.16e-09 ***
## weathersit3 -1965.09 197.05 -9.972 < 2e-16 ***
## temp 2855.01 1398.16 2.042 0.041526 *
## atemp 1786.16 1462.12 1.222 0.222261
## hum -1535.47 292.45 -5.250 2.01e-07 ***
## windspeed -2823.30 414.55 -6.810 2.09e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 769.2 on 702 degrees of freedom
## Multiple R-squared: 0.8484, Adjusted R-squared: 0.8423
## F-statistic: 140.3 on 28 and 702 DF, p-value: < 2.2e-16
## Warning in predict.lm(model, newdata = testData): prediction from a
## rank-deficient fit may be misleading
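The `NA` row for `workingday1` and the rank-deficiency warning are what one would expect if `workingday` is fully determined by `weekday` and `holiday`. The sketch below reproduces that situation on simulated data only. It assumes a coding of 0 = Sunday through 6 = Saturday and that holidays fall only on Monday to Friday; neither assumption has been checked against the real data, and `toy`, `fit`, and `fit_reduced` are hypothetical names.

```r
set.seed(1)
n <- 200

# Simulated calendar: weekday cycles 0..6 (0 = Sunday is an assumed coding)
weekday <- rep(0:6, length.out = n)

# Place a few holidays on Monday-Friday only (an assumption of this sketch)
holiday <- integer(n)
holiday[sample(which(weekday %in% 1:5), 8)] <- 1

# A working day is a Monday-Friday that is not a holiday
workingday <- as.integer(weekday %in% 1:5 & holiday == 0)

toy <- data.frame(
  cnt        = rnorm(n, mean = 4500, sd = 800) + 300 * workingday,
  weekday    = factor(weekday),
  holiday    = factor(holiday),
  workingday = factor(workingday)
)

# workingday1 equals weekday1 + ... + weekday5 - holiday1, so it is aliased
fit <- lm(cnt ~ holiday + weekday + workingday, data = toy)
coef(fit)["workingday1"]   # reported as NA
alias(fit)                 # shows the exact linear dependency

# Dropping the redundant term gives a full-rank fit and no predict() warning
fit_reduced <- lm(cnt ~ holiday + weekday, data = toy)
fit_reduced$rank == length(coef(fit_reduced))
```

Under these assumptions, removing `workingday` from the formula does not change the fitted values, because the remaining columns span the same space.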
##
## Call:
## lm(formula = cnt ~ . - instant - casual - registered, data = trainData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3463.5 -359.0 65.5 447.6 2862.0
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1485.47 272.21 5.457 7.31e-08 ***
## season2 822.91 198.17 4.153 3.81e-05 ***
## season3 724.46 232.25 3.119 0.001907 **
## season4 1616.73 193.22 8.367 4.80e-16 ***
## yr1 2032.14 64.15 31.678 < 2e-16 ***
## mnth2 27.84 157.94 0.176 0.860154
## mnth3 536.40 183.25 2.927 0.003560 **
## mnth4 568.70 276.37 2.058 0.040081 *
## mnth5 826.34 299.91 2.755 0.006056 **
## mnth6 803.29 317.11 2.533 0.011579 *
## mnth7 345.25 351.16 0.983 0.325945
## mnth8 772.56 335.71 2.301 0.021746 *
## mnth9 1295.03 295.74 4.379 1.43e-05 ***
## mnth10 532.99 265.42 2.008 0.045115 *
## mnth11 -270.15 251.88 -1.073 0.283931
## mnth12 -267.99 197.27 -1.359 0.174851
## holiday1 -598.33 212.00 -2.822 0.004940 **
## weekday1 184.39 119.75 1.540 0.124158
## weekday2 329.50 116.51 2.828 0.004851 **
## weekday3 465.04 117.61 3.954 8.68e-05 ***
## weekday4 394.69 118.12 3.341 0.000889 ***
## weekday5 438.63 117.46 3.734 0.000208 ***
## weekday6 484.16 114.73 4.220 2.85e-05 ***
## workingday1 NA NA NA NA
## weathersit2 -447.30 83.65 -5.347 1.31e-07 ***
## weathersit3 -2172.86 217.93 -9.970 < 2e-16 ***
## temp 553.24 2519.05 0.220 0.826245
## atemp 3534.70 2709.17 1.305 0.192529
## hum -1181.15 316.01 -3.738 0.000205 ***
## windspeed -2653.50 478.29 -5.548 4.48e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 754 on 555 degrees of freedom
## Multiple R-squared: 0.8526, Adjusted R-squared: 0.8451
## F-statistic: 114.6 on 28 and 555 DF, p-value: < 2.2e-16
## [1] 120.4481
## [1] 646592.6
## (Intercept) (Intercept) season2 season3 season4 yr1
## 1665.863991 0.000000 681.410070 511.041832 1159.414262 1898.034466
## mnth2 mnth3 mnth4 mnth5 mnth6 mnth7
## -85.144376 338.809529 348.916074 585.086078 400.649468 -5.762268
## mnth8 mnth9 mnth10
## 405.070653 968.596860 603.388394
## Joining with `by = join_by(temp, hum, windspeed, cnt, season1, season2,
## season3, season4, mnth1, mnth2, mnth3, mnth4, mnth5, mnth6, mnth7, mnth8,
## mnth9, mnth10, mnth11, mnth12, holiday0, holiday1, weekday0, weekday1,
## weekday2, weekday3, weekday4, weekday5, weekday6, workingday0, workingday1,
## weathersit1, weathersit2, weathersit3)`
## [1] 0.7073779
## Joining with `by = join_by(temp, hum, windspeed, casual_percent, season1,
## season2, season3, season4, mnth1, mnth2, mnth3, mnth4, mnth5, mnth6, mnth7,
## mnth8, mnth9, mnth10, mnth11, mnth12, holiday0, holiday1, weekday0, weekday1,
## weekday2, weekday3, weekday4, weekday5, weekday6, workingday0, workingday1,
## weathersit1, weathersit2, weathersit3)`
## [1] 0.82432